Hybrid Morphological Segmentation for Phrase-Based Machine Translation
Abstract
This article describes the Aalto University entry to the English-to-Finnish news translation shared task in WMT 2016. Our segmentation method combines the strengths of rule-based and unsupervised morphology. We also attempt to correct errors in the boundary markings by post-processing with a neural morph boundary predictor.
Similar resources
The JHU Machine Translation Systems for WMT 2016
This paper describes the submission of Johns Hopkins University for the shared translation task of ACL 2016 First Conference on Machine Translation (WMT 2016). We set up phrase-based, hierarchical phrase-based and syntax-based systems for all 12 language pairs of this year’s evaluation campaign. Novel research directions we investigated include: neural probabilistic language models, bilingual n...
Evaluating Syntax-Driven Approaches to Phrase Extraction for MT
In this paper, we examine a number of different phrase segmentation approaches for Machine Translation and how they perform when used to supplement the translation model of a phrase-based SMT system. This work represents a summary of a number of years of research carried out at Dublin City University in which it has been found that improvements can be made using hybrid translation models. Howev...
Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation
Syntactic Reordering of the source language to better match the phrase structure of the target language has been shown to improve the performance of phrase-based Statistical Machine Translation. This paper applies syntactic reordering to English-to-Arabic translation. It introduces reordering rules, and motivates them linguistically. It also studies the effect of combining reordering with Arabi...
Phrase-based Statistical Machine Translation between English and Welsh
This paper shows how a baseline phrase-based statistical machine translation (SMT) system can be set up for translation between English and Welsh, a UK language spoken by about 610,000 people, using well-documented and freely available tools and techniques. Our results indicate that the achievable performance for this language pair is among the better of those European languages reported in Koe...
Using collocation segmentation to extract translation units in a phrase-based statistical machine translation system
This report evaluates the impact of using a novel collocation segmentation method for phrase extraction in the standard phrase-based statistical machine translation approach. The collocation segmentation technique is implemented simultaneously in the source and target side. The resulting collocation segmentation is used to extract translation units. Experiments are reported in the Spanish-toEng...